Overview

Dataset statistics

Number of variables: 19
Number of observations: 800
Missing cells: 13
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 157.3 KiB
Average record size in memory: 201.3 B
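
The percentage of missing cells and the average record size follow from the counts in this table. A minimal sketch of that arithmetic, using only the figures reported above (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 19
n_observations = 800
missing_cells = 13
total_size_kib = 157.3

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of all cells that are missing, in percent.
missing_cells_pct = 100 * missing_cells / total_cells

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_cells_pct:.2f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the two derived values agree with the 0.1% and 201.3 B shown in the table.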

Variable types

Numeric: 8
Categorical: 9
Boolean: 2

Alerts

Name has a high cardinality: 799 distinct values (High cardinality)
Checkup is highly overall correlated with Disease (High correlation)
Disease is highly overall correlated with Checkup and 3 other fields (High correlation)
Height is highly overall correlated with Weight (High correlation)
Weight is highly overall correlated with Height (High correlation)
Diabetes is highly overall correlated with Exercise and 1 other field (High correlation)
Mental_Health is highly overall correlated with Disease (High correlation)
Physical_Health is highly overall correlated with Drinking_Habit (High correlation)
Exercise is highly overall correlated with Diabetes and 1 other field (High correlation)
Drinking_Habit is highly overall correlated with Physical_Health (High correlation)
Education has 13 (1.6%) missing values (Missing)
Name is uniformly distributed (Uniform)
PatientID has unique values (Unique)
Physical_Health has 311 (38.9%) zeros (Zeros)

Reproduction

Analysis started: 2022-12-17 15:54:27.783062
Analysis finished: 2022-12-17 15:54:33.497049
Duration: 5.71 seconds
Software version: pandas-profiling vdev
Download configuration: config.json

Variables

PatientID
Real number (ℝ)

Distinct: 800
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1513.9987
Minimum: 1001
Maximum: 2024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 1001
5-th percentile: 1050.95
Q1: 1247.5
median: 1519.5
Q3: 1777.25
95-th percentile: 1979.05
Maximum: 2024
Range: 1023
Interquartile range (IQR): 529.75

Descriptive statistics

Standard deviation: 300.87463
Coefficient of variation (CV): 0.19872845
Kurtosis: -1.235007
Mean: 1513.9987
Median Absolute Deviation (MAD): 265
Skewness: -0.0063300739
Sum: 1211199
Variance: 90525.543
Monotonicity: Not monotonic
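
Several of the PatientID statistics are derived from others in the same tables. The sketch below recomputes them from the reported values as a consistency check. It uses only figures from this section, and the variable names are illustrative.

```python
# Values copied from the PatientID statistics.
minimum, maximum = 1001, 2024
q1, q3 = 1247.5, 1777.25
mean = 1513.9987
std_dev = 300.87463
n_observations = 800

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std_dev / mean               # Coefficient of variation = std / mean
variance = std_dev ** 2           # Variance = squared standard deviation
total = mean * n_observations     # Sum = mean * number of observations

print(value_range, iqr, round(cv, 4), round(variance, 1), round(total))
```

Up to rounding of the printed inputs, the results agree with the reported Range, IQR, CV, Variance and Sum.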
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1167                1      0.1%
1879                1      0.1%
1780                1      0.1%
1237                1      0.1%
1656                1      0.1%
1222                1      0.1%
1543                1      0.1%
2015                1      0.1%
1845                1      0.1%
1855                1      0.1%
Other values (790)  790    98.8%

Value  Count  Frequency (%)
1001   1      0.1%
1003   1      0.1%
1004   1      0.1%
1005   1      0.1%
1006   1      0.1%
1008   1      0.1%
1009   1      0.1%
1010   1      0.1%
1011   1      0.1%
1012   1      0.1%

Value  Count  Frequency (%)
2024   1      0.1%
2023   1      0.1%
2022   1      0.1%
2020   1      0.1%
2019   1      0.1%
2018   1      0.1%
2017   1      0.1%
2016   1      0.1%
2015   1      0.1%
2014   1      0.1%

Name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 799
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

Mr. Gary Miller: 2
Mrs. Stephanie Gay: 1
Mr. Roger Rudd: 1
Mr. Vito Ertz: 1
Mrs. Marilyn Miller: 1
Other values (794): 794

Length

Max length: 25
Median length: 23
Mean length: 17.39375
Min length: 11

Characters and Unicode

Total characters: 13915
Distinct characters: 54
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
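
The category names used in this report (Uppercase Letter, Lowercase Letter, Space Separator, Other Punctuation) match the Unicode general categories Lu, Ll, Zs and Po. As a rough sketch, and not necessarily how pandas-profiling computes them, Python's standard `unicodedata` module can tally these categories. `category_counts` is an illustrative helper name, and the two names are taken from this section.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two names that appear in the Name variable.
names = ["Mrs. Stephanie Gay", "Mr. Gary Miller"]
counts = category_counts(names)

# Print categories from most to least frequent.
for code, n in counts.most_common():
    print(code, n)
```

Applied to the full column, the same tally would be expected to give the four categories listed under "Most occurring categories".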

Unique

Unique: 798
Unique (%): 99.8%
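
The Distinct (799) and Unique (798) counts differ because one name occurs twice. The sketch below assumes that "distinct" counts the different values and "unique" counts the values that occur exactly once, which is consistent with the figures for 800 rows. `distinct_and_unique` is an illustrative helper, and the short list stands in for the full column.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Four rows, one name repeated, mirroring the duplicate "Mr. Gary Miller".
names = ["Mr. Gary Miller", "Mrs. Stephanie Gay", "Mr. Roger Rudd", "Mr. Gary Miller"]
distinct, unique = distinct_and_unique(names)
print(distinct, unique)
```

With 800 rows and a single repeated name, the same logic gives 799 distinct and 798 unique values.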

Sample

1st row: Mrs. Stephanie Gay
2nd row: Mr. Sherman Nero
3rd row: Mr. Mark Boller
4th row: Mr. David Caffee
5th row: Mr. Gerald Emery

Common Values

Value                Count  Frequency (%)
Mr. Gary Miller      2      0.2%
Mrs. Stephanie Gay   1      0.1%
Mr. Roger Rudd       1      0.1%
Mr. Vito Ertz        1      0.1%
Mrs. Marilyn Miller  1      0.1%
Mr. David Hench      1      0.1%
Mr. Blair Simmons    1      0.1%
Mr. James Luna       1      0.1%
Mr. Irwin Mcclure    1      0.1%
Mr. Todd Doster      1      0.1%
Other values (789)   789    98.6%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
mr                   564    23.5%
mrs                  236    9.8%
michael              23     1.0%
james                22     0.9%
robert               20     0.8%
john                 19     0.8%
david                16     0.7%
richard              15     0.6%
william              12     0.5%
timothy              11     0.5%
Other values (1028)  1462   60.9%

Most occurring characters

Value              Count  Frequency (%)
(space)            1600   11.5%
r                  1556   11.2%
e                  983    7.1%
M                  973    7.0%
a                  925    6.6%
.                  800    5.7%
n                  718    5.2%
s                  645    4.6%
i                  616    4.4%
o                  612    4.4%
Other values (44)  4487   32.2%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   9115   65.5%
Uppercase Letter   2400   17.2%
Space Separator    1600   11.5%
Other Punctuation  800    5.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 1556
17.1%
e 983
10.8%
a 925
10.1%
n 718
7.9%
s 645
 
7.1%
i 616
 
6.8%
o 612
 
6.7%
l 607
 
6.7%
t 353
 
3.9%
h 319
 
3.5%
Other values (16) 1781
19.5%
Uppercase Letter
ValueCountFrequency (%)
M 973
40.5%
J 136
 
5.7%
C 125
 
5.2%
R 123
 
5.1%
B 103
 
4.3%
S 101
 
4.2%
D 94
 
3.9%
L 86
 
3.6%
A 75
 
3.1%
W 74
 
3.1%
Other values (16) 510
21.2%
Space Separator
Value    Count  Frequency (%)
(space)  1600   100.0%
Other Punctuation
Value    Count  Frequency (%)
.        800    100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11515
82.8%
Common 2400
 
17.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 1556
13.5%
e 983
 
8.5%
M 973
 
8.4%
a 925
 
8.0%
n 718
 
6.2%
s 645
 
5.6%
i 616
 
5.3%
o 612
 
5.3%
l 607
 
5.3%
t 353
 
3.1%
Other values (42) 3527
30.6%
Common
ValueCountFrequency (%)
1600
66.7%
. 800
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13915
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1600
 
11.5%
r 1556
 
11.2%
e 983
 
7.1%
M 973
 
7.0%
a 925
 
6.6%
. 800
 
5.7%
n 718
 
5.2%
s 645
 
4.6%
i 616
 
4.4%
o 612
 
4.4%
Other values (44) 4487
32.2%

Birth_Year
Real number (ℝ)

Distinct: 50
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1966.0438
Minimum: 1855
Maximum: 1993
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 1855
5-th percentile: 1953
Q1: 1961
median: 1966
Q3: 1974
95-th percentile: 1982.05
Maximum: 1993
Range: 138
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 15.421872
Coefficient of variation (CV): 0.0078441142
Kurtosis: 26.559098
Mean: 1966.0438
Median Absolute Deviation (MAD): 6
Skewness: -4.2088125
Sum: 1572835
Variance: 237.83413
Monotonicity: Not monotonic
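
The strongly negative skewness and high kurtosis suggest a few very early birth years far from the bulk of the data. The report itself does not flag outliers. As an illustration only, the sketch below applies the common 1.5 × IQR rule (Tukey's fences), a technique introduced here rather than one the report uses, to the quartiles above and to the ten smallest birth years the report lists for this variable. `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the Birth_Year quantile statistics.
q1, q3 = 1961, 1974
lower, upper = tukey_fences(q1, q3)

# The ten smallest birth years the report lists for this variable.
smallest = [1855, 1859, 1860, 1864, 1866, 1867, 1869, 1870, 1881, 1945]

# Keep only the values that fall below the lower fence.
flagged = [year for year in smallest if year < lower]
print(lower, upper, flagged)
```

Under this rule the birth years from 1855 to 1881 fall below the lower fence, while 1945 and the maximum of 1993 do not. Whether those early years are data-entry errors cannot be determined from the report alone.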
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
1964               57     7.1%
1965               47     5.9%
1963               38     4.8%
1968               37     4.6%
1970               31     3.9%
1966               31     3.9%
1971               31     3.9%
1960               29     3.6%
1962               29     3.6%
1958               27     3.4%
Other values (40)  443    55.4%

Value  Count  Frequency (%)
1855   1      0.1%
1859   3      0.4%
1860   1      0.1%
1864   2      0.2%
1866   1      0.1%
1867   1      0.1%
1869   1      0.1%
1870   1      0.1%
1881   1      0.1%
1945   1      0.1%

Value  Count  Frequency (%)
1993   4      0.5%
1988   4      0.5%
1987   12     1.5%
1985   2      0.2%
1984   9      1.1%
1983   9      1.1%
1982   7      0.9%
1981   25     3.1%
1980   22     2.8%
1979   22     2.8%

Region
Categorical

Distinct: 10
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

East Midlands: 154
London: 136
South West: 107
West Midlands: 89
South East: 84
Other values (5): 230

Length

Max length: 24
Median length: 15
Mean length: 11.82625
Min length: 6

Characters and Unicode

Total characters: 9461
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: London
2nd row: South West
3rd row: Yorkshire and the Humber
4th row: London
5th row: South East

Common Values

Value                     Count  Frequency (%)
East Midlands             154    19.2%
London                    136    17.0%
South West                107    13.4%
West Midlands             89     11.1%
South East                84     10.5%
East of England           80     10.0%
Yorkshire and the Humber  64     8.0%
North West                59     7.4%
North East                22     2.8%
LONDON                    5      0.6%
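
The table lists both "London" (136) and "LONDON" (5), which appear to be the same region spelled with different casing. A minimal sketch of merging such case variants, assuming they really denote the same region; `merge_case_variants` is a hypothetical helper, and the counts are copied from the table above:

```python
from collections import Counter

# Region counts copied from the "Common Values" table.
region_counts = {
    "East Midlands": 154,
    "London": 136,
    "South West": 107,
    "West Midlands": 89,
    "South East": 84,
    "East of England": 80,
    "Yorkshire and the Humber": 64,
    "North West": 59,
    "North East": 22,
    "LONDON": 5,
}


def merge_case_variants(counts):
    """Merge labels that differ only by letter case, keeping the most frequent spelling."""
    merged = Counter()
    canonical = {}
    # Visit labels from most to least frequent so the dominant spelling wins.
    for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        key = label.casefold()
        canonical.setdefault(key, label)
        merged[canonical[key]] += n
    return dict(merged)


cleaned = merge_case_variants(region_counts)
print(cleaned["London"], len(cleaned))
```

After merging, London has 141 rows and the variable has 9 distinct regions instead of 10. The total of 141 matches the count for the token "london" in the word-frequency table of this section.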

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
east 340
20.4%
west 255
15.3%
midlands 243
14.6%
south 191
11.5%
london 141
8.5%
north 81
 
4.9%
of 80
 
4.8%
england 80
 
4.8%
yorkshire 64
 
3.8%
and 64
 
3.8%
Other values (2) 128
 
7.7%

Most occurring characters

ValueCountFrequency (%)
t 931
 
9.8%
s 902
 
9.5%
867
 
9.2%
d 766
 
8.1%
n 739
 
7.8%
a 727
 
7.7%
o 688
 
7.3%
e 447
 
4.7%
E 420
 
4.4%
h 400
 
4.2%
Other values (18) 2574
27.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7110
75.2%
Uppercase Letter 1484
 
15.7%
Space Separator 867
 
9.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 931
13.1%
s 902
12.7%
d 766
10.8%
n 739
10.4%
a 727
10.2%
o 688
9.7%
e 447
6.3%
h 400
5.6%
l 323
 
4.5%
i 307
 
4.3%
Other values (7) 880
12.4%
Uppercase Letter
ValueCountFrequency (%)
E 420
28.3%
W 255
17.2%
M 243
16.4%
S 191
12.9%
L 141
 
9.5%
N 91
 
6.1%
Y 64
 
4.3%
H 64
 
4.3%
O 10
 
0.7%
D 5
 
0.3%
Space Separator
ValueCountFrequency (%)
867
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8594
90.8%
Common 867
 
9.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 931
10.8%
s 902
10.5%
d 766
 
8.9%
n 739
 
8.6%
a 727
 
8.5%
o 688
 
8.0%
e 447
 
5.2%
E 420
 
4.9%
h 400
 
4.7%
l 323
 
3.8%
Other values (17) 2251
26.2%
Common
ValueCountFrequency (%)
867
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9461
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 931
 
9.8%
s 902
 
9.5%
867
 
9.2%
d 766
 
8.1%
n 739
 
7.8%
a 727
 
7.7%
o 688
 
7.3%
e 447
 
4.7%
E 420
 
4.4%
h 400
 
4.2%
Other values (18) 2574
27.2%

Education
Categorical

Distinct: 6
Distinct (%): 0.8%
Missing: 13
Missing (%): 1.6%
Memory size: 44.8 KiB

University Complete (3 or more years): 239
High School Graduate: 196
Elementary School (1st to 9th grade): 183
High School Incomplete (10th to 11th grade): 102
University Incomplete (1 to 2 years): 37

Length

Max length: 43
Median length: 37
Mean length: 33.035578
Min length: 20

Characters and Unicode

Total characters: 25999
Distinct characters: 35
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: High School Incomplete (10th to 11th grade)
2nd row: High School Incomplete (10th to 11th grade)
3rd row: Elementary School (1st to 9th grade)
4th row: University Complete (3 or more years)
5th row: University Incomplete (1 to 2 years)

Common Values

Value                                        Count  Frequency (%)
University Complete (3 or more years)        239    29.9%
High School Graduate                         196    24.5%
Elementary School (1st to 9th grade)         183    22.9%
High School Incomplete (10th to 11th grade)  102    12.8%
University Incomplete (1 to 2 years)         37     4.6%
I never attended school / Other              30     3.8%
(Missing)                                    13     1.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
school 511
 
12.1%
to 322
 
7.6%
high 298
 
7.0%
grade 285
 
6.7%
university 276
 
6.5%
years 276
 
6.5%
3 239
 
5.6%
or 239
 
5.6%
more 239
 
5.6%
complete 239
 
5.6%
Other values (14) 1312
31.0%

Most occurring characters

ValueCountFrequency (%)
3449
 
13.3%
e 2544
 
9.8%
o 2200
 
8.5%
t 2015
 
7.8%
r 1754
 
6.7%
h 1226
 
4.7%
a 1166
 
4.5%
l 1072
 
4.1%
i 850
 
3.3%
m 800
 
3.1%
Other values (25) 8923
34.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 18439
70.9%
Space Separator 3449
 
13.3%
Uppercase Letter 1872
 
7.2%
Decimal Number 1087
 
4.2%
Open Punctuation 561
 
2.2%
Close Punctuation 561
 
2.2%
Other Punctuation 30
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 2544
13.8%
o 2200
11.9%
t 2015
10.9%
r 1754
9.5%
h 1226
 
6.6%
a 1166
 
6.3%
l 1072
 
5.8%
i 850
 
4.6%
m 800
 
4.3%
s 765
 
4.1%
Other values (8) 4047
21.9%
Uppercase Letter
ValueCountFrequency (%)
S 481
25.7%
H 298
15.9%
U 276
14.7%
C 239
12.8%
G 196
10.5%
E 183
 
9.8%
I 169
 
9.0%
O 30
 
1.6%
Decimal Number
ValueCountFrequency (%)
1 526
48.4%
3 239
22.0%
9 183
 
16.8%
0 102
 
9.4%
2 37
 
3.4%
Space Separator
ValueCountFrequency (%)
3449
100.0%
Open Punctuation
ValueCountFrequency (%)
( 561
100.0%
Close Punctuation
ValueCountFrequency (%)
) 561
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 30
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20311
78.1%
Common 5688
 
21.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 2544
12.5%
o 2200
 
10.8%
t 2015
 
9.9%
r 1754
 
8.6%
h 1226
 
6.0%
a 1166
 
5.7%
l 1072
 
5.3%
i 850
 
4.2%
m 800
 
3.9%
s 765
 
3.8%
Other values (16) 5919
29.1%
Common
ValueCountFrequency (%)
3449
60.6%
( 561
 
9.9%
) 561
 
9.9%
1 526
 
9.2%
3 239
 
4.2%
9 183
 
3.2%
0 102
 
1.8%
2 37
 
0.7%
/ 30
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25999
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3449
 
13.3%
e 2544
 
9.8%
o 2200
 
8.5%
t 2015
 
7.8%
r 1754
 
6.7%
h 1226
 
4.7%
a 1166
 
4.5%
l 1072
 
4.1%
i 850
 
3.3%
m 800
 
3.1%
Other values (25) 8923
34.3%

Height
Real number (ℝ)

Distinct: 15
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167.80625
Minimum: 151
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 151
5-th percentile: 154
Q1: 162
median: 167
Q3: 173
95-th percentile: 180
Maximum: 180
Range: 29
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 7.9768885
Coefficient of variation (CV): 0.047536301
Kurtosis: -0.88631625
Mean: 167.80625
Median Absolute Deviation (MAD): 6
Skewness: -0.33443479
Sum: 134245
Variance: 63.630749
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
ValueCountFrequency (%)
167 98
12.2%
172 81
10.1%
178 74
9.2%
162 69
8.6%
174 66
8.2%
173 61
7.6%
180 57
7.1%
171 56
 
7.0%
157 51
 
6.4%
165 45
 
5.6%
Other values (5) 142
17.8%
ValueCountFrequency (%)
151 21
 
2.6%
154 26
 
3.2%
155 31
 
3.9%
157 51
6.4%
158 39
 
4.9%
162 69
8.6%
165 45
5.6%
166 25
 
3.1%
167 98
12.2%
171 56
7.0%
ValueCountFrequency (%)
180 57
7.1%
178 74
9.2%
174 66
8.2%
173 61
7.6%
172 81
10.1%
171 56
7.0%
167 98
12.2%
166 25
 
3.1%
165 45
5.6%
162 69
8.6%

Weight
Real number (ℝ)

Distinct: 56
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.8275
Minimum: 40
Maximum: 97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 40
5-th percentile: 49
Q1: 58
median: 68
Q3: 77
95-th percentile: 88
Maximum: 97
Range: 57
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.11347
Coefficient of variation (CV): 0.17859232
Kurtosis: -0.72390191
Mean: 67.8275
Median Absolute Deviation (MAD): 9
Skewness: 0.12617779
Sum: 54262
Variance: 146.73616
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
70 32
 
4.0%
59 29
 
3.6%
72 26
 
3.2%
71 26
 
3.2%
61 26
 
3.2%
67 24
 
3.0%
56 24
 
3.0%
55 23
 
2.9%
69 23
 
2.9%
76 22
 
2.8%
Other values (46) 545
68.1%
ValueCountFrequency (%)
40 1
 
0.1%
41 2
 
0.2%
42 1
 
0.1%
44 1
 
0.1%
45 10
1.2%
46 5
 
0.6%
47 8
1.0%
48 8
1.0%
49 9
1.1%
50 13
1.6%
ValueCountFrequency (%)
97 3
 
0.4%
96 4
 
0.5%
95 2
 
0.2%
94 1
 
0.1%
93 4
 
0.5%
92 4
 
0.5%
90 6
0.8%
89 9
1.1%
88 12
1.5%
87 14
1.8%

Checkup
Categorical

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

More than 3 years: 429
Not sure: 312
Less than 3 years but more than 1 year: 53
Less than three months: 6

Length

Max length: 38
Median length: 17
Mean length: 14.91875
Min length: 8

Characters and Unicode

Total characters: 11935
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: More than 3 years
2nd row: Not sure
3rd row: More than 3 years
4th row: Not sure
5th row: More than 3 years

Common Values

Value                                   Count  Frequency (%)
More than 3 years                       429    53.6%
Not sure                                312    39.0%
Less than 3 years but more than 1 year  53     6.6%
Less than three months                  6      0.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
than 541
19.0%
more 482
17.0%
3 482
17.0%
years 482
17.0%
not 312
11.0%
sure 312
11.0%
less 59
 
2.1%
but 53
 
1.9%
1 53
 
1.9%
year 53
 
1.9%
Other values (2) 12
 
0.4%

Most occurring characters

ValueCountFrequency (%)
2041
17.1%
e 1400
11.7%
r 1335
11.2%
a 1076
9.0%
s 918
7.7%
t 918
7.7%
o 800
 
6.7%
h 553
 
4.6%
n 547
 
4.6%
y 535
 
4.5%
Other values (8) 1812
15.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 8559
71.7%
Space Separator 2041
 
17.1%
Uppercase Letter 800
 
6.7%
Decimal Number 535
 
4.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1400
16.4%
r 1335
15.6%
a 1076
12.6%
s 918
10.7%
t 918
10.7%
o 800
9.3%
h 553
 
6.5%
n 547
 
6.4%
y 535
 
6.3%
u 365
 
4.3%
Other values (2) 112
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
M 429
53.6%
N 312
39.0%
L 59
 
7.4%
Decimal Number
ValueCountFrequency (%)
3 482
90.1%
1 53
 
9.9%
Space Separator
ValueCountFrequency (%)
2041
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9359
78.4%
Common 2576
 
21.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1400
15.0%
r 1335
14.3%
a 1076
11.5%
s 918
9.8%
t 918
9.8%
o 800
8.5%
h 553
 
5.9%
n 547
 
5.8%
y 535
 
5.7%
M 429
 
4.6%
Other values (5) 848
9.1%
Common
ValueCountFrequency (%)
2041
79.2%
3 482
 
18.7%
1 53
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11935
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2041
17.1%
e 1400
11.7%
r 1335
11.2%
a 1076
9.0%
s 918
7.7%
t 918
7.7%
o 800
 
6.7%
h 553
 
4.6%
n 547
 
4.6%
y 535
 
4.5%
Other values (8) 1812
15.2%

Diabetes
Categorical

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

Neither I nor my immediate family have diabetes.: 392
I have/had pregnancy diabetes or borderline diabetes: 206
I do have diabetes: 144
I don't have diabetes, but I have direct family members who have diabetes.: 58

Length

Max length: 74
Median length: 52
Mean length: 45.515
Min length: 18

Characters and Unicode

Total characters: 36412
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Neither I nor my immediate family have diabetes.
2nd row: Neither I nor my immediate family have diabetes.
3rd row: Neither I nor my immediate family have diabetes.
4th row: I have/had pregnancy diabetes or borderline diabetes
5th row: I have/had pregnancy diabetes or borderline diabetes

Common Values

Value                                                                       Count  Frequency (%)
Neither I nor my immediate family have diabetes.                            392    49.0%
I have/had pregnancy diabetes or borderline diabetes                        206    25.8%
I do have diabetes                                                          144    18.0%
I don't have diabetes, but I have direct family members who have diabetes.  58     7.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
diabetes 1064
18.0%
i 858
14.5%
have 710
12.0%
family 450
7.6%
neither 392
 
6.6%
nor 392
 
6.6%
my 392
 
6.6%
immediate 392
 
6.6%
or 206
 
3.5%
borderline 206
 
3.5%
Other values (8) 846
14.3%

Most occurring characters

ValueCountFrequency (%)
e 5404
14.8%
5108
14.0%
a 3234
 
8.9%
i 2954
 
8.1%
d 2128
 
5.8%
t 2022
 
5.6%
m 1742
 
4.8%
r 1724
 
4.7%
h 1572
 
4.3%
b 1386
 
3.8%
Other values (18) 9138
25.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 29282
80.4%
Space Separator 5108
 
14.0%
Uppercase Letter 1250
 
3.4%
Other Punctuation 772
 
2.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 5404
18.5%
a 3234
11.0%
i 2954
10.1%
d 2128
 
7.3%
t 2022
 
6.9%
m 1742
 
5.9%
r 1724
 
5.9%
h 1572
 
5.4%
b 1386
 
4.7%
s 1122
 
3.8%
Other values (11) 5994
20.5%
Other Punctuation
ValueCountFrequency (%)
. 450
58.3%
/ 206
26.7%
' 58
 
7.5%
, 58
 
7.5%
Uppercase Letter
ValueCountFrequency (%)
I 858
68.6%
N 392
31.4%
Space Separator
ValueCountFrequency (%)
5108
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 30532
83.9%
Common 5880
 
16.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 5404
17.7%
a 3234
10.6%
i 2954
9.7%
d 2128
 
7.0%
t 2022
 
6.6%
m 1742
 
5.7%
r 1724
 
5.6%
h 1572
 
5.1%
b 1386
 
4.5%
s 1122
 
3.7%
Other values (13) 7244
23.7%
Common
ValueCountFrequency (%)
5108
86.9%
. 450
 
7.7%
/ 206
 
3.5%
' 58
 
1.0%
, 58
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36412
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 5404
14.8%
5108
14.0%
a 3234
 
8.9%
i 2954
 
8.1%
d 2128
 
5.8%
t 2022
 
5.6%
m 1742
 
4.8%
r 1724
 
4.7%
h 1572
 
4.3%
b 1386
 
3.8%
Other values (18) 9138
25.1%

High_Cholesterol
Real number (ℝ)

Distinct: 150
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 249.3225
Minimum: 130
Maximum: 568
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 130
5-th percentile: 179
Q1: 213.75
median: 244
Q3: 280
95-th percentile: 329.05
Maximum: 568
Range: 438
Interquartile range (IQR): 66.25

Descriptive statistics

Standard deviation: 51.566631
Coefficient of variation (CV): 0.20682702
Kurtosis: 4.9793639
Mean: 249.3225
Median Absolute Deviation (MAD): 33
Skewness: 1.1678961
Sum: 199458
Variance: 2659.1174
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
201 17
 
2.1%
238 16
 
2.0%
208 16
 
2.0%
181 13
 
1.6%
216 13
 
1.6%
258 13
 
1.6%
215 12
 
1.5%
286 11
 
1.4%
207 11
 
1.4%
244 11
 
1.4%
Other values (140) 667
83.4%
ValueCountFrequency (%)
130 3
0.4%
135 2
 
0.2%
145 3
0.4%
153 6
0.8%
161 4
0.5%
164 3
0.4%
168 1
 
0.1%
170 4
0.5%
171 3
0.4%
172 2
 
0.2%
ValueCountFrequency (%)
568 3
0.4%
421 2
0.2%
413 2
0.2%
411 2
0.2%
398 2
0.2%
358 3
0.4%
357 3
0.4%
346 4
0.5%
345 3
0.4%
344 2
0.2%

Blood_Pressure
Real number (ℝ)

Distinct: 49
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 131.05375
Minimum: 94
Maximum: 200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 94
5-th percentile: 108
Q1: 120
median: 130
Q3: 140
95-th percentile: 160
Maximum: 200
Range: 106
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 17.052693
Coefficient of variation (CV): 0.13011984
Kurtosis: 1.233682
Mean: 131.05375
Median Absolute Deviation (MAD): 10
Skewness: 0.78683803
Sum: 104843
Variance: 290.79435
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)
ValueCountFrequency (%)
120 111
 
13.9%
130 95
 
11.9%
140 86
 
10.8%
110 50
 
6.2%
150 42
 
5.2%
138 33
 
4.1%
128 33
 
4.1%
125 26
 
3.2%
132 23
 
2.9%
112 23
 
2.9%
Other values (39) 278
34.8%
ValueCountFrequency (%)
94 3
 
0.4%
100 13
 
1.6%
101 3
 
0.4%
102 5
 
0.6%
104 2
 
0.2%
105 7
 
0.9%
106 3
 
0.4%
108 15
 
1.9%
110 50
6.2%
112 23
2.9%
ValueCountFrequency (%)
200 3
 
0.4%
192 3
 
0.4%
180 5
 
0.6%
178 5
 
0.6%
174 2
 
0.2%
172 1
 
0.1%
170 12
1.5%
165 4
 
0.5%
164 3
 
0.4%
160 20
2.5%

Mental_Health
Real number (ℝ)

Distinct: 28
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.345
Minimum: 0
Maximum: 29
Zeros: 4
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8
Q1: 13
median: 18
Q3: 21
95-th percentile: 25
Maximum: 29
Range: 29
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.3851392
Coefficient of variation (CV): 0.31047214
Kurtosis: -0.14623242
Mean: 17.345
Median Absolute Deviation (MAD): 4
Skewness: -0.51235392
Sum: 13876
Variance: 28.999725
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
ValueCountFrequency (%)
20 81
 
10.1%
16 68
 
8.5%
19 65
 
8.1%
23 59
 
7.4%
18 56
 
7.0%
21 52
 
6.5%
13 49
 
6.1%
22 49
 
6.1%
17 36
 
4.5%
12 33
 
4.1%
Other values (18) 252
31.5%
ValueCountFrequency (%)
0 4
 
0.5%
3 2
 
0.2%
4 3
 
0.4%
5 11
1.4%
6 3
 
0.4%
7 16
2.0%
8 15
1.9%
9 26
3.2%
10 19
2.4%
11 27
3.4%
ValueCountFrequency (%)
29 4
 
0.5%
28 6
 
0.8%
27 4
 
0.5%
26 12
 
1.5%
25 24
 
3.0%
24 31
 
3.9%
23 59
7.4%
22 49
6.1%
21 52
6.5%
20 81
10.1%

Physical_Health
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 24
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.55875
Minimum: 0
Maximum: 30
Zeros: 311
Zeros (%): 38.9%
Negative: 0
Negative (%): 0.0%
Memory size: 44.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 3
Q3: 7
95-th percentile: 16
Maximum: 30
Range: 30
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.4491894
Coefficient of variation (CV): 1.1953253
Kurtosis: 1.5741974
Mean: 4.55875
Median Absolute Deviation (MAD): 3
Skewness: 1.345579
Sum: 3647
Variance: 29.693666
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
ValueCountFrequency (%)
0 311
38.9%
5 53
 
6.6%
2 51
 
6.4%
4 51
 
6.4%
7 40
 
5.0%
9 38
 
4.8%
6 36
 
4.5%
3 34
 
4.2%
1 30
 
3.8%
8 27
 
3.4%
Other values (14) 129
16.1%
ValueCountFrequency (%)
0 311
38.9%
1 30
 
3.8%
2 51
 
6.4%
3 34
 
4.2%
4 51
 
6.4%
5 53
 
6.6%
6 36
 
4.5%
7 40
 
5.0%
8 27
 
3.4%
9 38
 
4.8%
ValueCountFrequency (%)
30 1
 
0.1%
27 3
 
0.4%
21 4
 
0.5%
20 4
 
0.5%
19 9
1.1%
18 4
 
0.5%
17 13
1.6%
16 7
0.9%
15 10
1.2%
14 12
1.5%

Exercise
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 39.3 KiB

False: 536
True: 264

Value  Count  Frequency (%)
False  536    67.0%
True   264    33.0%

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 39.3 KiB

False: 673
True: 127

Value  Count  Frequency (%)
False  673    84.1%
True   127    15.9%

Drinking_Habit
Categorical

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

I usually consume alcohol every day: 406
I consider myself a social drinker: 383
I do not consume any type of alcohol: 11

Length

Max length: 36
Median length: 35
Mean length: 34.535
Min length: 34

Characters and Unicode

Total characters: 27628
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: I usually consume alcohol every day
2nd row: I consider myself a social drinker
3rd row: I consider myself a social drinker
4th row: I usually consume alcohol every day
5th row: I consider myself a social drinker

Common Values

Value                                 Count  Frequency (%)
I usually consume alcohol every day   406    50.7%
I consider myself a social drinker    383    47.9%
I do not consume any type of alcohol  11     1.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
i 800
16.6%
consume 417
8.6%
alcohol 417
8.6%
usually 406
8.4%
every 406
8.4%
day 406
8.4%
consider 383
7.9%
myself 383
7.9%
a 383
7.9%
social 383
7.9%
Other values (6) 438
9.1%

Most occurring characters

ValueCountFrequency (%)
4022
14.6%
l 2412
 
8.7%
e 2389
 
8.6%
o 2050
 
7.4%
a 2006
 
7.3%
s 1972
 
7.1%
y 1623
 
5.9%
c 1600
 
5.8%
r 1555
 
5.6%
u 1229
 
4.4%
Other values (11) 6770
24.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 22806
82.5%
Space Separator 4022
 
14.6%
Uppercase Letter 800
 
2.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 2412
10.6%
e 2389
10.5%
o 2050
9.0%
a 2006
8.8%
s 1972
8.6%
y 1623
 
7.1%
c 1600
 
7.0%
r 1555
 
6.8%
u 1229
 
5.4%
n 1205
 
5.3%
Other values (9) 4765
20.9%
Space Separator
ValueCountFrequency (%)
4022
100.0%
Uppercase Letter
ValueCountFrequency (%)
I 800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23606
85.4%
Common 4022
 
14.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 2412
10.2%
e 2389
10.1%
o 2050
 
8.7%
a 2006
 
8.5%
s 1972
 
8.4%
y 1623
 
6.9%
c 1600
 
6.8%
r 1555
 
6.6%
u 1229
 
5.2%
n 1205
 
5.1%
Other values (10) 5565
23.6%
Common
ValueCountFrequency (%)
4022
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27628
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4022
14.6%
l 2412
 
8.7%
e 2389
 
8.6%
o 2050
 
7.4%
a 2006
 
7.3%
s 1972
 
7.1%
y 1623
 
5.9%
c 1600
 
5.8%
r 1555
 
5.6%
u 1229
 
4.4%
Other values (11) 6770
24.5%

Fruit_Habit
Categorical

Distinct: 5
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 44.8 KiB

Less than 1. I do not consume fruits every day.: 452
1 to 2 pieces of fruit in average: 175
3 to 4 pieces of fruit in average: 105
5 to 6 pieces of fruit in average: 56
More than six pieces of fruit: 12

Length

Max length: 47
Median length: 47
Mean length: 40.85
Min length: 29

Characters and Unicode

Total characters: 32680
Distinct characters: 30
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Less than 1. I do not consume fruits every day.
2nd row: Less than 1. I do not consume fruits every day.
3rd row: Less than 1. I do not consume fruits every day.
4th row: Less than 1. I do not consume fruits every day.
5th row: 1 to 2 pieces of fruit in average

Common Values

Value                                            Count  Frequency (%)
Less than 1. I do not consume fruits every day.  452    56.5%
1 to 2 pieces of fruit in average                175    21.9%
3 to 4 pieces of fruit in average                105    13.1%
5 to 6 pieces of fruit in average                56     7.0%
More than six pieces of fruit                    12     1.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 627
 
8.6%
than 464
 
6.4%
less 452
 
6.2%
i 452
 
6.2%
do 452
 
6.2%
not 452
 
6.2%
consume 452
 
6.2%
fruits 452
 
6.2%
every 452
 
6.2%
day 452
 
6.2%
Other values (13) 2573
35.3%

Most occurring characters

Value              Count   Frequency (%)
(space)            6480    19.8%
e                  3188    9.8%
s                  2168    6.6%
o                  2052    6.3%
t                  2052    6.3%
n                  1704    5.2%
r                  1600    4.9%
a                  1588    4.9%
i                  1496    4.6%
u                  1252    3.8%
Other values (20)  9100    27.8%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   23256   71.2%
Space Separator    6480    19.8%
Decimal Number     1124    3.4%
Uppercase Letter   916     2.8%
Other Punctuation  904     2.8%

Most frequent character per category

Lowercase Letter
Value              Count   Frequency (%)
e                  3188    13.7%
s                  2168    9.3%
o                  2052    8.8%
t                  2052    8.8%
n                  1704    7.3%
r                  1600    6.9%
a                  1588    6.8%
i                  1496    6.4%
u                  1252    5.4%
f                  1148    4.9%
Other values (9)   5008    21.5%

Decimal Number
Value              Count   Frequency (%)
1                  627     55.8%
2                  175     15.6%
3                  105     9.3%
4                  105     9.3%
5                  56      5.0%
6                  56      5.0%

Uppercase Letter
Value              Count   Frequency (%)
L                  452     49.3%
I                  452     49.3%
M                  12      1.3%

Space Separator
Value              Count   Frequency (%)
(space)            6480    100.0%

Other Punctuation
Value              Count   Frequency (%)
.                  904     100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              24172   74.0%
Common             8508    26.0%

Most frequent character per script

Latin
Value              Count   Frequency (%)
e                  3188    13.2%
s                  2168    9.0%
o                  2052    8.5%
t                  2052    8.5%
n                  1704    7.0%
r                  1600    6.6%
a                  1588    6.6%
i                  1496    6.2%
u                  1252    5.2%
f                  1148    4.7%
Other values (12)  5924    24.5%

Common
Value              Count   Frequency (%)
(space)            6480    76.2%
.                  904     10.6%
1                  627     7.4%
2                  175     2.1%
3                  105     1.2%
4                  105     1.2%
5                  56      0.7%
6                  56      0.7%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              32680   100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
(space)            6480    19.8%
e                  3188    9.8%
s                  2168    6.6%
o                  2052    6.3%
t                  2052    6.3%
n                  1704    5.2%
r                  1600    4.9%
a                  1588    4.9%
i                  1496    4.6%
u                  1252    3.8%
Other values (20)  9100    27.8%

Water_Habit
Categorical

Distinct 3
Distinct (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory size 44.8 KiB

Value                                            Count
Between one liter and two liters                 364
More than half a liter but less than one liter   352
Less than half a liter                           84

Length

Max length 46
Median length 32
Mean length 37.11
Min length 22

Characters and Unicode

Total characters 29688
Distinct characters 19
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row: Between one liter and two liters
2nd row: Between one liter and two liters
3rd row: More than half a liter but less than one liter
4th row: More than half a liter but less than one liter
5th row: More than half a liter but less than one liter

Common Values

Value                                            Count   Frequency (%)
Between one liter and two liters                 364     45.5%
More than half a liter but less than one liter   352     44.0%
Less than half a liter                           84      10.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value             Count   Frequency (%)
liter             1152    18.8%
than              788     12.9%
one               716     11.7%
half              436     7.1%
a                 436     7.1%
less              436     7.1%
between           364     5.9%
and               364     5.9%
two               364     5.9%
liters            364     5.9%
Other values (2)  704     11.5%

Most occurring characters

Value             Count   Frequency (%)
(space)           5324    17.9%
e                 4112    13.9%
t                 3384    11.4%
l                 2304    7.8%
n                 2232    7.5%
a                 2024    6.8%
r                 1868    6.3%
i                 1516    5.1%
o                 1432    4.8%
s                 1236    4.2%
Other values (9)  4256    14.3%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  23564   79.4%
Space Separator   5324    17.9%
Uppercase Letter  800     2.7%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
e                 4112    17.5%
t                 3384    14.4%
l                 2304    9.8%
n                 2232    9.5%
a                 2024    8.6%
r                 1868    7.9%
i                 1516    6.4%
o                 1432    6.1%
s                 1236    5.2%
h                 1224    5.2%
Other values (5)  2232    9.5%

Uppercase Letter
Value             Count   Frequency (%)
B                 364     45.5%
M                 352     44.0%
L                 84      10.5%

Space Separator
Value             Count   Frequency (%)
(space)           5324    100.0%

Most occurring scripts

Value             Count   Frequency (%)
Latin             24364   82.1%
Common            5324    17.9%

Most frequent character per script

Latin
Value             Count   Frequency (%)
e                 4112    16.9%
t                 3384    13.9%
l                 2304    9.5%
n                 2232    9.2%
a                 2024    8.3%
r                 1868    7.7%
i                 1516    6.2%
o                 1432    5.9%
s                 1236    5.1%
h                 1224    5.0%
Other values (8)  3032    12.4%

Common
Value             Count   Frequency (%)
(space)           5324    100.0%

Most occurring blocks

Value             Count   Frequency (%)
ASCII             29688   100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
(space)           5324    17.9%
e                 4112    13.9%
t                 3384    11.4%
l                 2304    7.8%
n                 2232    7.5%
a                 2024    6.8%
r                 1868    6.3%
i                 1516    5.1%
o                 1432    4.8%
s                 1236    4.2%
Other values (9)  4256    14.3%

Disease
Categorical

Distinct 2
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 44.8 KiB

Value  Count
1      411
0      389

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 800
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
1      411    51.4%
0      389    48.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value  Count  Frequency (%)
1      411    51.4%
0      389    48.6%

Most occurring characters

Value  Count  Frequency (%)
1      411    51.4%
0      389    48.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  800    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      411    51.4%
0      389    48.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  800    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      411    51.4%
0      389    48.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  800    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      411    51.4%
0      389    48.6%

Interactions

[Figures: 64 pairwise interaction plots; images not preserved]

Correlations


Auto

The auto setting chooses an interpretable pairwise metric according to the types of the two columns, using the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
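
As an illustration of this definition, the sketch below ranks both variables (tied values share the mean of their positions) and then divides the covariance of the ranks by the product of their standard deviations. The function names are hypothetical and this is not the report's own implementation.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]   # monotonic but nonlinear in x
rho = spearman(x, y)      # approximately 1: perfectly monotonic
r = pearson(x, y)         # below 1: the relationship is not linear
```

The example shows why ρ suits nonlinear monotonic relationships: squaring preserves the ordering, so the ranks agree exactly even though the raw values do not lie on a line.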

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
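
A minimal sketch of this calculation in plain Python follows, together with a check of the location and scale invariance mentioned above. The function name and the sample data are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 9.0, 11.0]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# A negative scale factor flips the sign of r but keeps its magnitude.
r_flipped = pearson_r(x, [-b for b in y])
```

Here `r` and `r_rescaled` agree up to floating-point error, while `r_flipped` has the same magnitude with the opposite sign.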

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
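
The pair-counting definition can be sketched directly. The version below is the simple form described in the text (often called tau-a), in which tied pairs count as neither concordant nor discordant. Statistical libraries frequently apply a tie-adjusted variant, and which variant the report uses is not stated here. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two observations the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # only the middle two observations are swapped
tau = kendall_tau_a(x, y)   # 5 concordant, 1 discordant, 6 pairs
```

With five concordant pairs and one discordant pair out of six, τ is (5 - 1) / 6, or about 0.67.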

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
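
For illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table, using the commonly cited form of Bergsma's correction. It assumes every row and column total is positive, and it is not taken from the report's source code; the function name is hypothetical.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias of phi2 and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

v_perfect = cramers_v_corrected([[10, 0], [0, 10]])   # perfect association
v_independent = cramers_v_corrected([[5, 5], [5, 5]])  # independence
```

The two tables show the ends of the range: the diagonal table gives a value of about 1 and the uniform table gives 0.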

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

PatientIDNameBirth_YearRegionEducationHeightWeightCheckupDiabetesHigh_CholesterolBlood_PressureMental_HealthPhysical_HealthExerciseSmoking_HabitDrinking_HabitFruit_HabitWater_HabitDisease
01167Mrs. Stephanie Gay1965LondonHigh School Incomplete (10th to 11th grade)15567More than 3 yearsNeither I nor my immediate family have diabetes.358120212YesNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.Between one liter and two liters1
11805Mr. Sherman Nero1969South WestHigh School Incomplete (10th to 11th grade)17388Not sureNeither I nor my immediate family have diabetes.23014290YesNoI consider myself a social drinkerLess than 1. I do not consume fruits every day.Between one liter and two liters1
21557Mr. Mark Boller1974Yorkshire and the HumberElementary School (1st to 9th grade)16268More than 3 yearsNeither I nor my immediate family have diabetes.226122260NoNoI consider myself a social drinkerLess than 1. I do not consume fruits every day.More than half a liter but less than one liter1
31658Mr. David Caffee1958LondonUniversity Complete (3 or more years)18066Not sureI have/had pregnancy diabetes or borderline diabetes313125138YesNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.More than half a liter but less than one liter0
41544Mr. Gerald Emery1968South EastUniversity Incomplete (1 to 2 years)18058More than 3 yearsI have/had pregnancy diabetes or borderline diabetes277125182NoNoI consider myself a social drinker1 to 2 pieces of fruit in averageMore than half a liter but less than one liter1
51653Mr. David Lamothe1966East MidlandsNaN16749Not sureNeither I nor my immediate family have diabetes.28713077YesYesI consider myself a social drinkerLess than 1. I do not consume fruits every day.More than half a liter but less than one liter0
61422Mrs. Patricia Byrne1965Yorkshire and the HumberHigh School Graduate15863More than 3 yearsNeither I nor my immediate family have diabetes.358120212YesNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.Less than half a liter1
71806Mr. Wesley Shoemaker1965West MidlandsHigh School Graduate17867Less than 3 years but more than 1 yearNeither I nor my immediate family have diabetes.28015092YesNoI consider myself a social drinker1 to 2 pieces of fruit in averageMore than half a liter but less than one liter0
81703Mr. Billy Kirkland1965East of EnglandHigh School Graduate16263Less than 3 years but more than 1 yearNeither I nor my immediate family have diabetes.205110127YesNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.Between one liter and two liters1
91370Mrs. Tina Morris1979East MidlandsHigh School Graduate15451Not sureNeither I nor my immediate family have diabetes.3451321414YesYesI consider myself a social drinkerLess than 1. I do not consume fruits every day.Between one liter and two liters0
PatientIDNameBirth_YearRegionEducationHeightWeightCheckupDiabetesHigh_CholesterolBlood_PressureMental_HealthPhysical_HealthExerciseSmoking_HabitDrinking_HabitFruit_HabitWater_HabitDisease
7901239Mrs. Melanie Pope1960East of EnglandUniversity Complete (3 or more years)15448More than 3 yearsNeither I nor my immediate family have diabetes.248150196YesNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.Between one liter and two liters0
7911782Mrs. Kim Kling1958North WestUniversity Complete (3 or more years)15751More than 3 yearsNeither I nor my immediate family have diabetes.307130119NoNoI usually consume alcohol every day3 to 4 pieces of fruit in averageMore than half a liter but less than one liter1
7921695Mr. Michael Thomas1987East MidlandsHigh School Graduate17176Not sureNeither I nor my immediate family have diabetes.286126190YesNoI consider myself a social drinkerLess than 1. I do not consume fruits every day.More than half a liter but less than one liter0
7931590Mrs. Crystal Comeaux1948LondonUniversity Complete (3 or more years)15769More than 3 yearsI do have diabetes273120110YesNoI consider myself a social drinker1 to 2 pieces of fruit in averageLess than half a liter1
7941912Mr. Mike Jefferson1987Yorkshire and the HumberHigh School Graduate17374Not sureNeither I nor my immediate family have diabetes.202120137YesNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.Between one liter and two liters0
7951909Mr. Philip Klink1972East MidlandsHigh School Incomplete (10th to 11th grade)17861Not sureNeither I nor my immediate family have diabetes.204144124YesNoI consider myself a social drinkerLess than 1. I do not consume fruits every day.Between one liter and two liters0
7961386Mrs. Jackie Valencia1980North WestElementary School (1st to 9th grade)15761More than 3 yearsI have/had pregnancy diabetes or borderline diabetes213120230NoNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.Between one liter and two liters1
7971088Mrs. Cheryl Harris1860East MidlandsElementary School (1st to 9th grade)16748More than 3 yearsNeither I nor my immediate family have diabetes.2721402017NoNoI consider myself a social drinker3 to 4 pieces of fruit in averageMore than half a liter but less than one liter0
7981662Mr. Florencio Doherty1975East of EnglandElementary School (1st to 9th grade)16575More than 3 yearsNeither I nor my immediate family have diabetes.208112160NoNoI usually consume alcohol every dayLess than 1. I do not consume fruits every day.More than half a liter but less than one liter1
7991117Mr. Freddie Vermillion1979LondonElementary School (1st to 9th grade)17370Not sureNeither I nor my immediate family have diabetes.1811201112YesNoI consider myself a social drinkerLess than 1. I do not consume fruits every day.Less than half a liter0